Intensity and coherence of motifs in weighted complex networks 
The local structure of unweighted networks can be characterized by the number of times a subgraph
appears in the network. The clustering coefficient, reflecting the local configuration of
triangles, can be seen as a special case of this approach. In this Letter we generalize this method for 
weighted networks. We introduce subgraph intensity as the geometric mean of its link weights and 
coherence as the ratio of the geometric to the corresponding arithmetic mean. Using these measures, 
motif scores and clustering coefficient can be generalized to weighted networks. To demonstrate these 
concepts, we apply them to financial and metabolic networks and find that inclusion of weights may 
considerably modify the conclusions obtained from the study of unweighted characteristics. 

PACS numbers: 89.75.-k, 89.75.Hc, 89.65.-s, 87.16.Ac 



The network approach to complex systems has turned 
out to be extremely fruitful and it has revealed some 
general principles applicable to a large number of systems.
Studies have produced unexpected findings such as
the ubiquity of scale freeness, the frequent appearance of 
high clustering, and the relationship between functional- 
ity and the high appearance frequency of specific motifs. 
This approach has also led to a number of novel paradig- 
matic models, providing a holistic framework in which 
the details of the interactions between the constituents 
of the complex systems are disregarded and only their 
scaffolds are considered.

A deeper understanding of these systems requires that, 
in addition to the underlying network structure, informa- 
tion about the strength of interactions is also taken into 
account. This is accomplished by assigning weights to the 
links, such as transportation fluxes in the Internet and air
traffic networks or the reaction fluxes building the
metabolic pathways of a cell. Weights can also be obtained
by applying a classification (or clustering) scheme
to a correlation matrix, for example to understand the structure
underlying the dynamics of microarray [5] and stock
market data. Optimal paths and minimum spanning
trees also clearly depend on the distribution of
weights. These examples indicate the need to generalize
the network characteristics to weighted networks. Some
recent efforts towards this goal are the discussion of the
clustering coefficient for node weights, the introduction
of a definition for the link-weighted case, and the
mapping of weighted networks to multigraphs. Our
aim in this Letter is to introduce a set of practical tools 
that may be used to study the structure of a diverse group 
of systems where interaction strengths can be obtained
and where omitting them would lead to a considerable 
loss of information. Many biological and social systems 
are expected to fall into this category. 

In general, we consider any weighted network as a 
fully connected graph where some of the links bear zero 
weights. For simplicity, we deal with (directed or undirected)
networks where the weight $w_{ij}$ between nodes $i$
and $j$ is non-negative and not necessarily normalized. We
introduce the intensity $I(g)$ of subgraph $g$ with vertices
$v_g$ and links $\ell_g$ as the geometric mean of its weights:

$$I(g) = \Big( \prod_{(ij) \in \ell_g} w_{ij} \Big)^{1/|\ell_g|}, \qquad (1)$$

where $|\ell_g|$ is the number of links in $\ell_g$. The definition
suggests a shift in perspective from regarding subgraphs
as discrete objects (either exist or not) to a continuum 
of subgraph intensities, where zero or very low intensity 
values imply that the subgraph in question does not exist 
or exists at a practically insignificant intensity level. In 
practice, low intensity values could result, for example, 
from measurement noise. 

Due to the nature of the geometric mean, the subgraph 
intensity $I(g)$ may be low because one of the weights is
very low, or it may result from all of the weights being
low. In order to distinguish between these two extremes,
we introduce subgraph coherence $Q(g)$ as the ratio of the
geometric to the arithmetic mean of the weights as

$$Q(g) = \frac{I(g)\,|\ell_g|}{\sum_{(ij) \in \ell_g} w_{ij}}. \qquad (2)$$

Here $Q \in [0,1]$ and it is close to unity only if the subgraph
weights do not differ much, i.e. are internally coherent.
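The two definitions can be illustrated with a minimal sketch. The function names `intensity` and `coherence` are hypothetical, and a subgraph is assumed to be given simply as a list of its non-negative link weights.

```python
import math

def intensity(weights):
    """Geometric mean of the link weights of one subgraph (Eq. 1)."""
    if any(w == 0 for w in weights):
        return 0.0  # a missing link gives zero intensity
    return math.exp(sum(math.log(w) for w in weights) / len(weights))

def coherence(weights):
    """Ratio of the geometric to the arithmetic mean of the weights (Eq. 2)."""
    arithmetic_mean = sum(weights) / len(weights)
    return intensity(weights) / arithmetic_mean if arithmetic_mean > 0 else 0.0

# Two triangles with the same intensity but very different coherence.
balanced = [0.2, 0.2, 0.2]    # all weights equally low
skewed = [1.0, 1.0, 0.008]    # one very low weight

print(intensity(balanced), coherence(balanced))
print(intensity(skewed), coherence(skewed))
```

Both triangles have an intensity of about 0.2, but only the balanced one has a coherence close to unity, which is the distinction the coherence is meant to capture.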

The concept of a motif was originally introduced to 
denote "patterns of interconnections occurring in com- 
plex networks at numbers that are significantly higher 
than those in randomized networks". However, this
has led to some confusion, which partly stems from the
specification of the random ensemble, i.e. the underlying
null hypothesis. We define a motif as a set (ensemble)
of topologically equivalent subgraphs of a network.
With weighted networks it becomes more natural
to deal with intensities as opposed to numbers of occur- 
rence, where the latter is obtained as a special case of 
the former. The motifs showing statistically significant 
deviation from some reference system can then be called 
high or low intensity motifs. 

We define the total intensity $I_M$ of a motif $M$ in the
network as the sum of its subgraph intensities, $I_M = \sum_{g \in M} I(g)$.
For certain weighted directed motifs, the
total intensities can be computed using simple matrix
operations. Let the $N \times N$ weight matrix $W$ describe
the network weights. Analogously, let $A$ represent the
underlying $N \times N$ adjacency matrix such that $a_{ij} = 1$
if $w_{ij} > 0$, and $a_{ij} = 0$ if $w_{ij} = 0$. In an unweighted
network, the number of directed paths returning to the
starting node after $k$ steps can be written as

$$\sum_{i_1, \ldots, i_k} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_k i_{k+1}} = \mathrm{Tr}(A^k), \qquad (3)$$

where the summation goes over all possible sites and
$i_{k+1} = i_1$. Let $W^{(1/k)}$ represent a matrix obtained
from $W$ by taking the $k$-th root of its individual
elements such that $W^{(1/k)} = [w_{ij}^{1/k}]$. The total intensity
of motif $M$ in the network is

$$I_M = \frac{1}{o_M} \mathrm{Tr}\big[ (W^{(1/k)})^k \big], \qquad (4)$$

where $o_M$ is a combinatorial factor ensuring that each
subgraph is counted only once. For example, for the
non-frustrated triangle (Fig. 3, middle column) the total
intensity becomes $I_\triangle = \frac{1}{3} \mathrm{Tr}[(W^{(1/3)})^3]$. A change in
the direction of a link can be taken into account using
the matrix transpose. For some motifs, such as the path
of order 2 (Fig. 3, left column), we need a "block" matrix
$B = [b_{ij}]$ to prevent us from double-counting subgraphs.
In this matrix the diagonal elements $b_{ii} = 0$ and for the
non-diagonal elements $b_{ij} = 0$ when $a_{ij} = 1$ or $a_{ji} = 1$,
and otherwise $b_{ij} = 1$. This allows us to write the total
intensity of the path motif as $I_{\mathrm{path}} = \mathrm{Tr}[W^{(1/2)} W^{(1/2)} B]$.
We prevent double counting here for reasons of compati- 
bility with earlier work, but find that it poses no serious 
problem as long as the system of counting is systemat- 
ically applied both in the empirical and random case. 
Double counting could, in fact, be desirable if the inter- 
action strength measurements are noisy. Envision adding 
a small number $\epsilon$ to every link (including the zeros) to
represent a noise component. Larger subgraphs may now 
simply consist of noise. 
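The matrix expressions for the non-frustrated triangle and the path of order 2 can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `W` is taken to be a non-negative weight matrix with a zero diagonal (no self-links), and element `W[i, j]` is read as the weight of the link from node i to node j.

```python
import numpy as np

def cyclic_triangle_intensity(W):
    """Total intensity of directed 3-cycles, (1/3) Tr[(W^(1/3))^3]."""
    R = np.cbrt(W)                     # element-wise cube root, W^(1/3)
    # Each cycle i -> j -> k -> i is found from three start nodes, hence 1/3.
    return np.trace(R @ R @ R) / 3.0

def path2_intensity(W):
    """Total intensity of open two-step paths, Tr[W^(1/2) W^(1/2) B]."""
    A = W > 0
    R = np.sqrt(W)                     # element-wise square root, W^(1/2)
    # Block matrix: 1 only where the two end nodes are not linked at all.
    B = (~(A | A.T)).astype(float)
    np.fill_diagonal(B, 0.0)           # end nodes must be distinct
    return np.trace(R @ R @ B)

# A chain 0 -> 1 -> 2 with weights 4 and 9: one path of intensity sqrt(36) = 6.
chain = np.zeros((3, 3))
chain[0, 1], chain[1, 2] = 4.0, 9.0

# A cycle 0 -> 1 -> 2 -> 0 with weights 1, 8, 27: one triangle of intensity 6.
cycle = np.zeros((3, 3))
cycle[0, 1], cycle[1, 2], cycle[2, 0] = 1.0, 8.0, 27.0

print(path2_intensity(chain), cyclic_triangle_intensity(chain))
print(path2_intensity(cycle), cyclic_triangle_intensity(cycle))
```

In the cycle every pair of nodes is linked, so the block matrix vanishes and no open path is counted, while the chain contains no closed triangle.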

In earlier work, the z-score for studying the statistical significance
of motif occurrences was defined as

$$z_M = \frac{N_M - \langle n_M \rangle}{\sigma_M}, \qquad (5)$$

where $N_M$ is the number of subgraphs in motif $M$ in
the empirical network, $\langle n_M \rangle$ is the expectation of
their number in the reference ensemble, and $\sigma_M$ is the
standard deviation of the latter. Replacing the number
of subgraphs by their intensities generalizes the z-score
to the motif intensity score

$$\tilde{z}_M = \frac{I_M - \langle i_M \rangle}{\big( \langle i_M^2 \rangle - \langle i_M \rangle^2 \big)^{1/2}}, \qquad (6)$$

where $i_M$ is the total intensity of motif $M$ in one realization
of the reference system. It is clear that Eqs. (5)
and (6) coincide for binary weights, implying that $\tilde{z} \to z$
in the limit. As an analogue to the motif intensity score,
we introduce the motif coherence score as

$$z'_M = \frac{Q_M - \langle q_M \rangle}{\big( \langle q_M^2 \rangle - \langle q_M \rangle^2 \big)^{1/2}}, \qquad (7)$$

where $Q_M$ and $q_M$ are the total coherence for motif $M$
in the empirical network and in one realization of the
reference system, respectively. As the coherence of an
unweighted subgraph is unity, also $z' \to z$ as the weights
become binary.
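As a hedged illustration, the motif intensity score can be estimated from a finite sample of reference realizations. The function name `motif_intensity_score` and the toy numbers below are hypothetical. The same function applies to the coherence score if total coherences are passed instead of total intensities.

```python
import numpy as np

def motif_intensity_score(empirical_total, reference_totals):
    """Score of an empirical motif total against reference-ensemble totals.

    empirical_total  : total intensity of the motif in the empirical network
    reference_totals : total intensities, one per realization of the reference
    """
    ref = np.asarray(reference_totals, dtype=float)
    variance = np.mean(ref ** 2) - np.mean(ref) ** 2
    # Assumes the reference totals are not all identical (variance > 0).
    return (empirical_total - ref.mean()) / np.sqrt(variance)

# Toy values only: three reference realizations and one empirical total.
print(motif_intensity_score(4.0, [1.0, 2.0, 3.0]))
```

For binary weights the totals reduce to subgraph counts, so the same function then returns the ordinary z-score.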

Triangles are among the simplest nontrivial motifs and
they play an important role as one of the basic quantities
of network characterization in defining the clustering
coefficient $C_i$ at node $i$ as

$$C_i = \frac{2 t_i}{k_i (k_i - 1)}, \qquad (8)$$

where $k_i$ is the degree of node $i$ and $t_i$ is the number
of triangles attached to the node. This quantity
is normalized between 0 and 1, and it characterizes the
tendency of the nearest neighbors of node $i$ to be
interconnected.

As triangles are one type of subgraph, the definition
in Eq. (1) may be used to yield the weighted clustering
coefficient $\tilde{C}_i$ by replacing the number of triangles $t_i$ in
Eq. (8) with the sum of triangle intensities as

$$\tilde{C}_i = \frac{2}{k_i (k_i - 1)} \sum_{j<k} (\tilde{w}_{ij} \tilde{w}_{jk} \tilde{w}_{ki})^{1/3}, \qquad (9)$$

where each triangle at node $i$ is counted once and
where we use weights scaled by the largest weight in the
network, $\tilde{w}_{ij} = w_{ij} / \max(w_{ij})$. This definition fulfills the
requirement that $\tilde{C}_i \to C_i$ as the weights become binary.
We can relate the unweighted and weighted clustering
coefficients through the average intensity of triangles at
node $i$ as $\bar{I}_i = \frac{1}{t_i} \sum_{g \in \mathcal{N}(v_i)} I(g)$, where $\mathcal{N}(v_i)$ denotes the
neighborhood of node $i$, and this allows us to write the
weighted clustering coefficient as

$$\tilde{C}_i = \bar{I}_i C_i. \qquad (10)$$



This equation gives a plausible interpretation of the 
weighted clustering coefficient: It is the unweighted 
(topological) clustering coefficient renormalized by the 
average intensity. Naturally, a weighted clustering coefficient
$C'_i$ can also be formulated by renormalizing the
unweighted coefficient by the average coherence $\bar{Q}_i$, instead
of the average intensity $\bar{I}_i$, around node $i$.
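A minimal sketch of the unweighted and intensity-based clustering coefficients for an undirected network is given below. It assumes a symmetric, non-negative weight matrix with a zero diagonal and at least one positive weight. The function name `clustering_coefficients` is hypothetical.

```python
import numpy as np

def clustering_coefficients(W):
    """Return (C, C_tilde): unweighted and intensity-based clustering per node."""
    A = (W > 0).astype(float)
    R = np.cbrt(W / W.max())             # cube roots of the scaled weights
    k = A.sum(axis=1)                    # node degrees
    # Diagonals of the cubed matrices visit every triangle at a node twice.
    triangles = np.diag(A @ A @ A) / 2.0
    intensities = np.diag(R @ R @ R) / 2.0
    pairs = k * (k - 1) / 2.0            # number of neighbor pairs
    with np.errstate(divide="ignore", invalid="ignore"):
        C = np.where(pairs > 0, triangles / pairs, 0.0)
        C_tilde = np.where(pairs > 0, intensities / pairs, 0.0)
    return C, C_tilde

# One triangle with weights 1, 1 and 0.125: its intensity is 0.125^(1/3) = 0.5.
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.125],
              [1.0, 0.125, 0.0]])
C, C_tilde = clustering_coefficients(W)
print(C, C_tilde)
```

In this toy case every node has an unweighted coefficient of 1 and an average triangle intensity of 0.5, so the weighted coefficient is 0.5, in line with Eq. (10).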

An alternative definition for the weighted clustering coefficient
was given in [3] as

$$\hat{C}_i = \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2} a_{ij} a_{ik} a_{jk}, \qquad (11)$$

where $s_i$ denotes the strength of node $i$, defined as
$s_i = \sum_j w_{ij}$, and $a_{ij}$ is an element of the underlying
binary adjacency matrix. This definition considers only
two of the three link weights, namely those adjacent to
node $i$ ($w_{ij}$ and $w_{ik}$), and requires that a link exist also
between nodes $j$ and $k$ but does not take its weight ($w_{jk}$)
into account. The difference between the two weighted
clustering coefficients $\tilde{C}_i$ and $\hat{C}_i$ is illustrated schematically
in Fig. 1.

Figure 1: A schematic illustration of the difference between
$\tilde{C}_i$ and $\hat{C}_i$. The weight $w_{jk}$ is gradually decreased from left
to right. The value of $\hat{C}_i$ is equal for the first three triangles
and drops to zero suddenly for the fourth triangle as $w_{jk} \to 0$,
implying that $a_{jk} = 0$. In contrast, the value of $\tilde{C}_i$ decreases
as $\tilde{C}_i \sim w_{jk}^{1/3}$, tending smoothly to zero in the limit.
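The contrast between the two weighted coefficients can be reproduced numerically for a single triangle. The sketch below is illustrative only: the function names are hypothetical, the strength-based coefficient is written as a sum of the mean of the two adjacent weights over linked neighbor pairs, divided by the node strength times the degree minus one, and the weights adjacent to node 0 are fixed at 1 while the opposite weight `x` is lowered.

```python
import numpy as np

def strength_based_clustering(W, i):
    """Coefficient using only the two weights adjacent to node i."""
    A = (W > 0).astype(float)
    k, s = A[i].sum(), W[i].sum()
    if k < 2:
        return 0.0
    n = W.shape[0]
    total = 0.0
    for j in range(n):
        for h in range(n):
            total += 0.5 * (W[i, j] + W[i, h]) * A[i, j] * A[i, h] * A[j, h]
    return total / (s * (k - 1))

def intensity_based_clustering(W, i):
    """Coefficient using the geometric mean of all three triangle weights."""
    k = (W[i] > 0).sum()
    if k < 2:
        return 0.0
    R = np.cbrt(W / W.max())
    # The diagonal of R^3 counts each triangle at node i twice.
    return (R @ R @ R)[i, i] / (k * (k - 1))

def triangle(x):
    """Triangle with w_01 = w_02 = 1 and opposite weight w_12 = x."""
    return np.array([[0.0, 1.0, 1.0], [1.0, 0.0, x], [1.0, x, 0.0]])

for x in (1.0, 0.1, 0.001, 0.0):
    W = triangle(x)
    print(x, strength_based_clustering(W, 0), intensity_based_clustering(W, 0))
```

The strength-based value stays at 1 until the opposite link disappears, whereas the intensity-based value follows the cube root of `x` down to zero.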

Next we apply these concepts to two real networks. 

(A) Undirected financial network. We considered a set 
of daily price data for N = 477 NYSE traded stocks 
from 1980 to 2000. We calculated the correlation matrix 
by extracting 4-year return windows in order to study 
the system's dynamics. Here the nodes correspond to 
stocks, and the weighted undirected links to the ele- 
ments in the correlation matrix. Thus, the stronger the 
weight, the stronger the coupling between the stock returns
in terms of their linear correlation. The links are
inserted in the network in descending order starting from
the strongest one until a predetermined number of links
has been reached. The method is described in detail
elsewhere.
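The link-insertion step can be sketched as follows, assuming the correlation matrix for one time window is already available as a symmetric NumPy array and that the retained correlations are positive, so that they can serve directly as non-negative weights. The function name `strongest_links_network` and the toy matrix are hypothetical.

```python
import numpy as np

def strongest_links_network(corr, n_links):
    """Keep the n_links strongest off-diagonal correlations as link weights."""
    n = corr.shape[0]
    rows, cols = np.triu_indices(n, k=1)        # each node pair once
    values = corr[rows, cols]
    keep = np.argsort(values)[::-1][:n_links]   # descending order of strength
    W = np.zeros((n, n))
    W[rows[keep], cols[keep]] = values[keep]
    return W + W.T                              # undirected: symmetric weights

# Toy correlation matrix for four stocks (illustrative values only).
corr = np.array([[1.0, 0.9, 0.2, 0.5],
                 [0.9, 1.0, 0.4, 0.1],
                 [0.2, 0.4, 1.0, 0.8],
                 [0.5, 0.1, 0.8, 1.0]])
print(strongest_links_network(corr, 2))
```

With two links requested, only the pairs with correlations 0.9 and 0.8 remain connected.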

We have shown earlier that the famous Black Monday 
(10/19/1987) causes a temporary transition not only in 
the topology but also in the weights of the network.
Our aim is to use it as an example of a network undergoing
this type of two-fold transition (topology and
weights) and to see whether the changes are reflected in
the network's clustering statistics. In Fig. 2 we show the
three clustering coefficients, averaged over the network,
as functions of time: the unweighted $C$ of Eq. (8), the
weighted $\hat{C}$ introduced in [3] and given in Eq. (11), and
the weighted $\tilde{C}$ introduced in Eq. (9).

The crash is not seen very clearly in $C$, as it can only
capture the topological aspects of the transition. The
weighted coefficient $\hat{C}$ is also fairly insensitive to the
changes in link weights and practically coincides with $C$.
The fact that $\tilde{C}$ does reflect the transition indicates its
ability to capture both aspects of the transition. The average
values for the clustering coefficients outside (inside)
the crash period are $C = 0.57$ ($C = 0.60$), $\hat{C} = 0.58$
($\hat{C} = 0.60$), and $\tilde{C} = 0.36$ ($\tilde{C} = 0.50$). These numbers
imply that $C$ and $\hat{C}$ increase less than 5% during the
crash, which is less than their normal (outside the crash
period) fluctuation, measured at 6.2% as their standard
deviation relative to the mean. However, the crash increases
$\tilde{C}$ by 39%, which is considerably larger than
its level of fluctuation at 9.7%. Thus, $\tilde{C}$ has a considerably
higher "signal-to-noise" ratio. The results are not
affected significantly by the value of the predetermined
threshold. In the limit of inserting all the links of the
correlation matrix, we obtain a fully connected network
for which $C = \hat{C} = 1$ for all times, whereas $\tilde{C}$ still shows
the effect of the crash clearly.

Figure 2: Average clustering coefficients for the financial network.
The weighted clustering coefficient $\tilde{C}$ (+) of Eq. (9)
shows the effect of Black Monday clearly. The unweighted $C$
(□) of Eq. (8) and the weighted $\hat{C}$ (×) of Eq. (11) practically
coincide (the markers □ and × are used alternately).

(B) Directed metabolic network. Cellular metabolism 
can be represented as a directed network of intracellular 
molecular interactions. The network consists of nodes
$X_i$, $Y_j$, which represent the chemicals, and they are linked
if connected by a metabolic reaction. Here we focus on
the metabolic pathways of the bacterium Escherichia coli
grown in glucose, which has been studied intensely.
In order to experiment with weighted directed motifs, we
define the weights through a biochemical reaction of the
form $x_1 X_1 + \cdots + x_n X_n \to y_1 Y_1 + \cdots + y_m Y_m$ with a positive
(negative) net flux $f$ if the balance of the reaction lies
to the right (left). The flux provides an overall measure
of the relative activity of each reaction. We define the
weights as $w_{ij} = (y_j / x_i) f$, reflecting the rate at which
$X_i$ is converted into $Y_j$.

In order to employ motif intensity scores, a reference
system, corresponding to a null hypothesis, needs to be
established. We follow a typical approach by constructing
an ensemble of random networks by conserving the
degree sequence of the empirical network using a switching
algorithm which preserves the single-node characteristics
of the empirical network. The weights are obtained
simply by permuting the empirical weights. While
removing any weight correlations, the approach guarantees
conservation of the empirical weight distribution.

Figure 3: Motif intensities for the empirical network (vertical
lines) and the corresponding random ensembles (histograms),
for the unweighted (upper panel) and weighted (lower panel)
cases.
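One realization of such a reference system might be generated along the following lines. This is a sketch under assumptions, not the exact algorithm used for the results: it uses a simple pairwise target switch that keeps each node's in- and out-degree, rejects self-links and duplicate links, and then assigns a random permutation of the empirical weights. The function names are hypothetical, and the network is assumed to have at least two links.

```python
import numpy as np

def switch_links(links, n_switches, rng):
    """Degree-preserving randomization of a directed link list."""
    links = list(links)
    present = set(links)
    for _ in range(n_switches):
        p, q = rng.choice(len(links), size=2, replace=False)
        (a, b), (c, d) = links[p], links[q]
        # Proposed switch: a -> b, c -> d  becomes  a -> d, c -> b.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue                      # would create a self-link or duplicate
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        links[p], links[q] = (a, d), (c, b)
    return links

def reference_realization(W, n_switches, rng):
    """Randomized weight matrix with the same degrees and weight distribution."""
    sources, targets = np.nonzero(W)
    links = list(zip(sources.tolist(), targets.tolist()))
    weights = rng.permutation(W[sources, targets])   # shuffle empirical weights
    R = np.zeros(W.shape, dtype=float)
    for (i, j), w in zip(switch_links(links, n_switches, rng), weights):
        R[i, j] = w
    return R

rng = np.random.default_rng(0)
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0, 4.0],
              [5.0, 0.0, 0.0, 6.0],
              [7.0, 0.0, 0.0, 0.0]])
print(reference_realization(W, 100, rng))
```

Repeating the call many times gives an ensemble of motif totals from which the mean and standard deviation entering the scores can be estimated.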

We summarize our findings in Fig. 3, in which we
show the unweighted and weighted motif intensities for
a subset of the studied motifs: (i) path of order 2,
(ii) non-frustrated triangle, and (iii) frustrated triangle.
The motif intensity scores for the unweighted networks,
which are based on the subgraph counts, are $z_{i} = -5.4$,
$z_{ii} = 12.8$, and $z_{iii} = -0.5$, and for the weighted networks
$\tilde{z}_{i} = 14.8$, $\tilde{z}_{ii} = 33.8$, and $\tilde{z}_{iii} = 9.0$. These results
show that a move from unweighted to weighted characteristics
can cause a change from low to high intensity, i.e.
from under-representation to over-representation (motif i). The
intensity may become amplified, i.e. increase the extent
of over-representation (motif ii), or it may increase from average
to high intensity, i.e. from statistically insignificant to
over-representation (motif iii).

In this Letter we have proposed two new concepts for 
the characterization of weighted complex networks: the 
intensity and coherence of a subgraph. They allow for 
a very natural generalization of the z-scores to motif intensity
scores (Eqs. (6) and (7)), and of the clustering coefficient
to the weighted clustering coefficient (Eq. (9)). Our
studies with undirected financial networks show that the
weighted clustering coefficient reflects the effects of a
market crash which is hardly observed with the other clustering
characteristics studied. Our results on the directed
metabolic network of E. coli indicate that incorporation
of weights into network motifs may considerably modify 
the conclusions drawn from their statistics. 